This paper describes our project, which uses data on housing prices in Melbourne in 2017. Establishing a real estate price prediction system is a key task for the healthy development of the real estate industry. A simple predictive and inferential method for modeling housing prices helps businesses determine fair prices and helps governments set property taxes. This project aims to learn how different factors affect home sale prices by building linear models. Although the data were collected in Melbourne, Australia, in 2017, the idea that location and home attributes correlate with housing prices could reasonably apply to other markets internationally.
The following questions are the main subjects which this project focuses on:
The following are excerpts and graphs from our exploratory data analysis (EDA). This part of the project familiarizes the reader with our dataset's attributes and lays the foundation for the variables we will include in our linear model. The results of our EDA will also inform the future direction of the project.
Rooms: Number of rooms
Price: Price (AUS$)
Method: Method of sale - 5 categories
Type: House, Unit, Townhouse - 3 categories
SellerG: Real Estate Agent - 268 categories
Date: Date sold
Distance: Distance from Central Business District
Regionname: Region name - 8 categories
Propertycount: Number of properties that exist in the suburb
Bedroom2: Number of bedrooms
Bathroom: Number of bathrooms
Car: Number of car spaces
Landsize: Land Size
BuildingArea: Building Size
YearBuilt: Year home built
CouncilArea: Governing council for the area - 34 categories
Lattitude, Longtitude: GPS location
Suburb: Suburb name - 314 categories
Mean: $1,075,684
SD: $639,310.72
| Statistic | Price (AUS$) |
|---|---|
| Min | 85000 |
| Q1 | 650000 |
| Median | 903000 |
| Mean | 1075684 |
| Q3 | 1330000 |
| Max | 9000000 |
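The summary statistics above can be reproduced with a few base R calls. The following is a minimal sketch on a hypothetical toy vector `prices`, not the project's data; in the actual analysis the input would presumably be `data.full$Price`.

```r
# Hypothetical toy prices in AUS$ (an assumption for illustration only).
prices <- c(650000, 903000, 1330000, 480000, 2100000)

# Five-number summary plus mean and standard deviation.
# na.rm = TRUE skips missing prices, should any exist.
price_summary <- c(
  Min    = min(prices, na.rm = TRUE),
  Q1     = unname(quantile(prices, 0.25, na.rm = TRUE)),
  Median = median(prices, na.rm = TRUE),
  Mean   = mean(prices, na.rm = TRUE),
  Q3     = unname(quantile(prices, 0.75, na.rm = TRUE)),
  Max    = max(prices, na.rm = TRUE),
  SD     = sd(prices, na.rm = TRUE)
)

print(round(price_summary))
```

Because the mean of the real data exceeds its median, the price distribution is right-skewed, which is worth keeping in mind when reading the residual summaries later.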
\(H_0\): All group means are equal
All tests reject \(H_0\) with p-value \(<2\times 10^{-16}\)
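A test of equal group means can be sketched as follows. The source does not name the test it used, so a one-way ANOVA via `aov` is an assumption here, and the data frame `toy` is hypothetical.

```r
# Hypothetical toy data: sale price by property type (illustration only).
toy <- data.frame(
  Price = c(1200000, 1350000, 1100000,   # House
            600000,  650000,  580000,    # Unit
            900000,  950000,  870000),   # Townhouse
  Type  = rep(c("House", "Unit", "Townhouse"), each = 3)
)

# One-way ANOVA: H0 says mean Price is the same for every Type.
fit <- aov(Price ~ factor(Type), data = toy)

# The first row of the ANOVA table holds the F test for the grouping factor.
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
print(p_value)
```

A small `p_value` is evidence against \(H_0\). The same call could be repeated for each categorical variable, such as `Method` or `Regionname`.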
# 4 Residual Analysis
##
## Call:
## lm(formula = Price ~ Rooms + Distance + Bathroom + Car + BuildingArea +
## Lattitude + Longtitude + Propertycount + factor(Regionname),
## data = data.train)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2399657  -259926   -40715   178350  8109450
##
## Coefficients:
##                                               Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                  -1.41e+08   1.79e+07   -7.87  4.3e-15 ***
## Rooms                                         3.21e+05   8.97e+03   35.73  < 2e-16 ***
## Distance                                     -4.44e+04   1.52e+03  -29.15  < 2e-16 ***
## Bathroom                                      1.71e+05   1.15e+04   14.94  < 2e-16 ***
## Car                                           5.59e+04   7.79e+03    7.17  8.4e-13 ***
## BuildingArea                                  4.21e+01   1.04e+01    4.04  5.5e-05 ***
## Lattitude                                    -7.82e+05   1.29e+05   -6.04  1.7e-09 ***
## Longtitude                                    7.70e+05   1.21e+05    6.37  2.0e-10 ***
## Propertycount                                -3.64e+00   1.58e+00   -2.30   0.0212 *
## factor(Regionname)Eastern Victoria            1.65e+05   1.07e+05    1.54   0.1229
## factor(Regionname)Northern Metropolitan      -5.90e+04   3.13e+04   -1.88   0.0599 .
## factor(Regionname)Northern Victoria           5.40e+05   1.21e+05    4.47  8.0e-06 ***
## factor(Regionname)South-Eastern Metropolitan  1.54e+05   5.34e+04    2.89   0.0039 **
## factor(Regionname)Southern Metropolitan       2.27e+05   2.83e+04    8.03  1.2e-15 ***
## factor(Regionname)Western Metropolitan       -7.28e+04   4.02e+04   -1.81   0.0701 .
## factor(Regionname)Western Victoria            5.07e+05   1.40e+05    3.62   0.0003 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 452000 on 4942 degrees of freedom
## (4548 observations deleted due to missingness)
## Multiple R-squared: 0.531, Adjusted R-squared: 0.53
## F-statistic: 373 on 15 and 4942 DF, p-value: <2e-16
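A residual analysis of a fitted linear model typically starts from the residuals and fitted values. The sketch below uses simulated data, with only two predictors named after the columns `Rooms` and `Distance`. The coefficients, sample size, and noise level are assumptions chosen for illustration, not estimates from the project.

```r
set.seed(1)

# Simulated stand-in for the training data (an assumption, not the real set).
n <- 200
toy <- data.frame(
  Rooms    = sample(1:5, n, replace = TRUE),
  Distance = runif(n, min = 1, max = 30)
)
toy$Price <- 300000 + 250000 * toy$Rooms - 20000 * toy$Distance +
  rnorm(n, sd = 150000)

# Fit a linear model and extract residuals and fitted values.
fit <- lm(Price ~ Rooms + Distance, data = toy)
res <- resid(fit)
fitted_vals <- fitted(fit)

# Quantiles of the residuals, analogous to the "Residuals:" block in summary().
print(quantile(res))

# Typical diagnostic plots (uncomment to draw them):
# plot(fitted_vals, res); abline(h = 0)  # look for curvature or funnel shapes
# qqnorm(res); qqline(res)               # check approximate normality
```

In the summary above, the largest residual (8109450) is far greater in magnitude than the smallest (-2399657), so a plot of residuals against fitted values would be a natural check for outliers and non-constant variance.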
\[ R^2 = 1 - \dfrac{RSS}{TSS} = 0.441 \]
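The formula can be checked by hand against R's own output. This is a minimal sketch on simulated data, where `x` and `y` are hypothetical stand-ins for a predictor and the sale price: it computes the residual sum of squares (RSS) and total sum of squares (TSS) directly and compares the result with `summary(fit)$r.squared`.

```r
set.seed(42)

# Simulated predictor and response (assumptions for illustration only).
x <- runif(100, min = 1, max = 30)
y <- 1000000 - 25000 * x + rnorm(100, sd = 200000)

fit <- lm(y ~ x)

rss <- sum(resid(fit)^2)      # residual sum of squares
tss <- sum((y - mean(y))^2)   # total sum of squares around the mean
r_squared <- 1 - rss / tss

# Matches the value lm() reports, up to floating-point error.
print(c(manual = r_squared, from_summary = summary(fit)$r.squared))
```

The same formula applies to any set of observed and predicted prices: replace `resid(fit)` with observed minus predicted values and compute TSS from the observed prices in that set.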